Gauss curvature-based unique signatures of individual large earthquakes and its implications for customized data-driven prediction

Statistical descriptions of earthquakes offer important probabilistic information, and newly emerging technologies of high-precision observations and machine learning collectively advance our knowledge regarding complex earthquake behaviors. Still, there remains a formidable knowledge gap for predicting individual large earthquakes’ locations and magnitudes. Here, this study shows that the individual large earthquakes may have unique signatures that can be represented by new high-dimensional features—Gauss curvature-based coordinates. Particularly, the observed earthquake catalog data are transformed into a number of pseudo physics quantities (i.e., energy, power, vorticity, and Laplacian) which turn into smooth surface-like information via spatio-temporal convolution, giving rise to the new high-dimensional coordinates. Validations with 40-year earthquakes in the West U.S. region show that the new coordinates appear to hold uniqueness for individual large earthquakes (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M_w \ge 7.0$$\end{document}Mw≥7.0), and the pseudo physics quantities help identify a customized data-driven prediction model. A Bayesian evolutionary algorithm in conjunction with flexible bases can identify a data-driven model, demonstrating its promising reproduction of individual large earthquake’s location and magnitude. Results imply that an individual large earthquake can be distinguished and remembered while its best-so-far model can be customized by machine learning. This study paves a new way to data-driven automated evolution of individual earthquake prediction.


Gauss curvature-based unique signatures of individual large earthquakes.
If there exists a unique signature before the onset of an individual large earthquake and also if the signature can be detectable and learnable, it can facilitate the prediction of an individual large event's magnitude and location. This study seeks such unique signatures from the Gauss curvatures of the pseudo physics quantities. Since the spatio-temporal convolution process endows sufficient smoothness to the pseudo physics quantities (i.e., released energy, power, vorticity, and Laplacian), they can be regarded as a smooth surface at each depth. In lieu of simple values of the quantities, the Gauss curvature-maximum and minimum principal curvature ( κ 1 , κ 2 )-can be used as informative new features (coordinates) at the event location prior to the onset of a large earthquake. To assess the uniqueness, this study calculates eight-dimensional vector K ∈ R 8 consisting of the principal Gauss curvatures from the four pseudo physics quantities: where subscripts E, P, V and L stand for the pseudo released energy, the pseudo power, the pseudo vorticity's first term, and the pseudo Laplacian's first term, respectively. Eqs. 50-68 in 29 explain the detailed calculation procedure of the Gauss curvatures using the pseudo physics quantities. Figure 2 presents the example plots of the smooth pseudo physics quantities and the associated principal Gauss curvatures near the peak magnitude zone marked by red box in Fig Table 1.
Uniqueness of the Gauss curvature-based coordinates. This study hypothesizes that the Gauss curvature-based coordinates K of the large earthquake are unique in the past 30 years. To prove the uniqueness of K , this study calculates comprehensive K s of all target epochs from 10119 (i.e., December 1989) through 10477 (October 2019). To calculate a K of one target epoch, prior 10 years' earthquakes data are needed (precisely, 119 months) for the spatio-temporal convolution. For instance, the K of target 10119 requires the information convolution using all the prior earthquakes from 10000 (January 1980) to 10118 (November 1989).
As Table 3 summarizes the algorithms, the algorithm first calculates the Gauss curvature-based coordinates K at the center points of the entire reference volumes in all epochs (Step-I in Table 3). At the beginning of Step-II of Table 3, K (k) , (k = 1, . . . , 8) at the hypocenters of the eight large earthquakes (Table 1) are calculated as a reference coordinate set. The number of reference earthquakes is 8 since only they meet the magnitude criterion ( M w ≥ 7.0 ), within the 30-years time frame and in the West U.S. region under consideration. Then, a comprehensive comparison follows. In each target epoch, entire K s of all reference volumes are compared to the eight reference earthquakes' K (k) , (k = 1, . . . , 8) in terms of L1 norm. This comparison quantitatively asserts whether there is a similar K to any known large earthquake's K . Figure 3A confirms that the eight large earthquakes' K appear to be unique, at least within the 40-year earthquakes used in this study. Results confirm that there are no events similar to the eight large events in terms of the Gauss curvature-based coordinates. Importantly, the L1 norm of K distance between any two individual large earthquakes is noticeably large (Fig. 3B), i.e., ( ||K − K|| 1 > 0.1 ). And the large eight reference earthquakes remain uniquely distinguishable by their Gauss curvature-based coordinates. Still, it is premature to generalize this finding, calling for in-depth future investigations. However, the unique representation of individual large earthquakes via physically meaningful new coordinates may cast a new light on the "customized" prediction of an individual large earthquake as explained in the following section.
(1)   www.nature.com/scientificreports/ Implication to customized prediction of individual large earthquakes. The best-so-far identified rule of magnitude prediction.. In hopes of being independent of any prior knowledge of existing magnitude prediction models [4][5][6]30 , or earthquake forecasting methods [5][6][7][8]10,[12][13][14][15]31 , this study seeks to use a purely datadriven prediction model customized for individual large earthquakes. To this aim, this study developed a Bayesian evolutionary algorithm, of which overall architecture is illustrated in Fig. S15. The central notion in Fig. S15 is in alignment with the author's recent application to the identification of hidden models behind nano-scale phenomena 32 as well as complex heterogeneous structures 33,34 . In pursuit of the data-driven individual prediction model, many possible candidates of physics quantities are explored by the developed framework: the pseudo released energy E r and many forms of physical variants of E r including the three components of the spatial gradient vector ∇ g E (t) r , the local maximums and minimums of ∇ g E (t) r , the time derivative ∂∇ g E (t) r /∂t meaning the power, the pseudo vorticity (Eq. 12), and the pseudo Laplacian (Eq. 14). From the comparative investigations, the best-so-far prediction rule identified by the Bayesian evolutionary algorithm suggests including the following pseudo physics quantities: (1) the pseudo released energy (the corresponding best-so-far cubic regression The pseudo released energy ( E r ), the pseudo power ( ∂E r /∂t ), the first term of pseudo vorticity ω , and the first term of the pseudo Laplacian ∂ 2 E r /∂ 2 . Red box indicate the ±1 • zone near the peak magnitude ( M w = 7.1 ) covering (−116.25 • ± 1 • , 34.65 • ± 1 • ) at depth 12.5 km; (E-H) Enlarged plots of the peak zone (red box in A-D) and calculated principal Gauss curvatures (κ 1 , κ 2 ). www.nature.com/scientificreports/ spline (CRS) LF is denoted by L E ), (2) the power, i.e., the time derivative of the pseudo released energy ( L P ), (3) the pseudo vorticity of the pseudo released energy flow ( L ω ), and (4) the pseudo Laplacian ( L L ). The best-so-far rule of magnitude prediction is identified as the multiplicative combination of these CRS LFs of these physics quantities as where E * (t) r is the best-so-far pseudo released energy at epoch t and at the reference volume ξ j . The free parameters associated with the best-so-far CRS LFs L E , L P , L ω , and L L are denoted by θ E , θ P , θ ω , and θ L , respectively. All terms of the pseudo physics quantities in Eq. (2) are with respect to the geodetic coordinate system, and the details regarding definitions and derivations are presented in "Methods" section. Sg(.) stands for a typical sigmoid function, Sg(x) = 1/(1 + e −x ) , for brevity. The power term's LF L P uses the sigmoid function to transform ∂E (t) r (ξ j )/∂t ∈ R[−∞, ∞] to R(0, 1) which is compatible with the input range of CRS bases. A slightly modified sigmoid with a scaling-up factor e 2 is used since it appears to outperform against a typical sigmoid case. This scheme applies to the pseudo vorticity's LF L ω since ω h ∈ R[−∞, ∞] . Amongst many candidates for L ω , e.g., ω , ω φ , ω h , or ω 2 + ω 2 φ , comparative investigations suggest that ω appears to give the most plausible performance, as finally included in Eq. (2). Physically, ω may describe the slow rotational motion of the energy flow about the longitudinal axis. This study's training data are from the West U.S. region of which plate motions and the known major faults are roughly parallel or normal to the longitudinal axis. This coincidence may underpin the relatively important role of ω in the identified rule of magnitude prediction. As mentioned earlier, the bestso-far prediction model in Eq. (2) includes the separate term of the pseudo Laplacian ∂ 2 E * (t) r ∂ 2 . Compared to the predictions using other separate terms, ∂ 2 E * (t) r ∂φ 2 , ∂ 2 E * (t) r ∂h 2 or the resultant pseudo Laplacian ∇ 2 g E (t) r (ξ j ) (here, subscript g indicates the geodetic coordinate system-based derivatives), the prediction only with ∂ 2 E * (t) r ∂ 2 appears to Table 1. Gauss curvature signatures of the eight large earthquakes, M w ≥ 7.0 (Subscripts E, P, L, and V correspond to the pseudo released energy, the pseudo power, the pseudo Laplacian's first term, and the pseudo vorticity's first term, respectively). (2) is due to the large range 10 4 ∼ 10 5 of the calcu- This is another interesting coincidence with the best-performing role of ω of the pseudo vorticity. The regional characteristics of the West U.S. plate's motions may be the potential rationale behind this data-driven finding. Figure 2A-D present example plots of the smooth surface of the pseudo physics quantities. As expected, the distributions of the pseudo physics quantities are noticeably different from each other and preserves smoothness, which facilitates the calculation of the geometric quantities such as Gauss curvature.
Feasibility test results of individual large earthquake predictions. To conduct a feasibility test of the customized data-driven prediction of individual large earthquakes, this study used the ML-identified best-so-far model given in Eq. (2). It should be noted that the proposed ML-identified prediction model uses the same set of pseudo physics quantities as that is used in the generation of the Gauss curvature-based coordinates. The role of Gauss curvature-based coordinates is to offer a unique signature to individual large events, not directly to predict location and magnitude. The subsequent prediction model may use the same set of pseudo physics quantities; this study denotes such a prediction as the "associated" prediction model. Or one may choose to adopt other advanced prediction methods using other ML methods (e.g., [22][23][24], being independent of the pseudo physics quantities used in the Gauss curvature-based coordinates; this study denotes the general predictions as the "unassociated" prediction models. In this context, the present one is "associated" prediction model. The best-sofar prediction model is applied to the large reference events ( M w ≥ 7.0 ) in the West U.S. region (i.e., longitude and latitude in (− 130, − 110) and (30,45) [deg], respectively, and depth (− 5, 20) [km]) from 1990-2019. The model uses the observed 10-year data, 30 days before the event without any physics mechanisms or statistical laws. The best-so-far model appears to be successful in reproducing the next-month earthquake's location and magnitude as shown in Fig. 4. In some cases, the ML-identified model appears to reproduce the global peak event noticeably well with little false peaks (e.g., Fig. 4C-D). In other cases, the ML-identified rules reproduce reasonably the global peak's location and magnitude with a few false peaks (e.g., Figs. 4A,B,E,F). Table 2 summarizes the prediction results of individual eight large earthquakes using the best-so-far data-driven prediction model. Individual event's location and magnitude are reasonably reproduced by the customized data-driven model.

Discussion
A high-level analogy of the pseudo quantities to the Helmholtz's theorem. The data-driven prediction selects out the pseudo physics quantities, i.e., pseudo released energy, power, vorticity, and Laplacian, as new important features in terms of the Gauss curvatures. The new features inherit the surface-invariant strength of the Gauss curvatures, being independent on how the surface is embedded in 3D space. It is physically clear that the released energy and its time derivative, power, appear to hold importance for large earthquakes. But, a natural question remains-why the pseudo vorticity and Laplacian quantities emerge as new important features? This section suggests a plausible mathematical answer to this question based on a high-level analogy to the Helmholtz theorem-in particular, the Helmholtz decomposition of the three-dimensional vector fields. The Helmholtz decomposition states that a smooth, differentiable, and rapidly decaying 3D vector field F ∈ R 3 can be decomposed into the curl-free irrotational scalar potential ∈ R and the divergence-free solenoidal vector potential A ∈ R 3 as With the assumption of the fast decaying F , the potentials are given by where terms with () ′ is about r ′ . In principle, the Helmholtz decomposition helps elucidate complex vector fields in terms of two physically meaningful irrotational (i.e., ) and rotational (i.e., A ) quantities. If we regard the gradient of the pseudo power as the 3D vector field by F(r) = ∇ g ∂E (t) r (r) ∂t , the pseudo vorticity provides a divergence-free rotational quantity by Eq. (12), Here, the pseudo released energy E r is obtained by a combination of link functions by Eq. (11) which takes the convolved spatio-temporal information index II ST as input. The convolution processes (Eq. 9 and Eq. 10) are re-written in a concise form as . Across the given (spatial and/or temporal) domain, the proximity-dependent integration process is commonly used in the Helmholtz decomposition Eqs. (4)(5) and the proposed Eq. (7). Since ∇ · (∇ × A) = 0 for ∀A ∈ R 3 , the pseudo vorticity appears to highlight the divergence-free rotational information of the vector field under consideration. Now, focusing on the pseudo Laplacian in Eq. (14), and regarding F(r) = ∇ g E (t) r (r) , we have www.nature.com/scientificreports/ As mentioned earlier, E r uses the Gaussian weighting function with exp −|r − r ′ | 2 which is comparable to the distance-dependent decaying function |r − r ′ | −1 in Eq. (4), and also the spatial integration is commonly done in Eqs. (4), and (9). Since ∇ × (∇φ) = 0 for ∀φ ∈ R , the pseudo Laplacian appears to highlight the curl-free irrotational information of the vector field under consideration. Therefore, the proposed usage of the pseudo vorticity and Laplacian can efficiently elucidate and convey the rotational and irrotational physical information of the past earthquakes, at least indirectly. It should be noted that this high-level analogy to the Helmholtz theorem is not the starting point of the adoption of the pseudo vorticity and Laplacian. Reversely, the present datadriven training and prediction process selects out the pseudo vorticity and Laplacian as relatively outstanding quantities-i.e., the inclusion of them in prediction outperforms the cases without them. The data suggests first, and the analogy to the Helmholtz theorem comes later. It appears to underpin the purely data-driven approach of this paper, in lieu of physics principle-driven approach.
Remaining challenges and potential solutions by future research. It is important to remark the two challenges-data scarcity and overfitting. To overcome the data scarcity, one immediate solution would be to release the non-overlapping definition of epochs and shorten the epoch-to-epoch interval to one day from a month. One epoch is still defined as 30-day time range. As explained in Fig. 5D, if the interval between consecutive two epochs is defined by one day, we can dramatically increase the number of total epochs to 14,600 from 480 with the same 40-years earthquake catalog data. Although thorough confirmation is needed with more data sets, Fig. 5A,B support that the uniqueness of Gauss curvature-based signatures of individual large earthquakes appears to hold with the refined epoch definitions. This refinement may add a valuable means to an existing research branch regarding the precursory signals of large earthquakes [4][5][6] . While being unique from other earthquakes' signatures, there appears to exist interesting temporal variations of the Gauss curvature-based signatures as shown in Fig. 5C as getting close to the onset of a large earthquake. It is premature to draw any conclusion from this variation of the Gauss curvatures, but in the future extension the variations may be used for sequenceoriented ML methods 35,36 . For interested researchers, data sets of the refined epochs with one-day interval are shared on 28 . Next, since the extreme earthquakes are rare, the data-driven prediction model most likely suffers from the overfitting problem. A successful prediction model may not generalize to other future earthquakes. But this issue may be tackled by the separation of two tasks-classification of large earthquakes and improvement of datadriven prediction models. With new data accumulate, the unique Gauss curvature-based signatures of individual large earthquakes can help reveal the difference or similarity of large earthquakes by using unsupervised ML techniques. Since an earthquake prediction model is customized for individual event, the separate classification task may inform us which prediction model should be used. Thereby, a prediction model can continue improving as being specialized for a class of large earthquakes, and there can be many different customized prediction models, in lieu of a single model. Moreover, the separate tasks of classification and model improvement can be performed by a high-level ML method such as a reinforcement learning 37 , which may facilitate ML-driven autonomous evolution.
The ML-identified best-so-far prediction model is not the final version but an interim candidate, being subject to substantial improvement and evolution. Since it is purely data-driven, the improvement of the earthquake data sets 22,24,38 will positively influence the prediction accuracy. As long as the reliability and precision of the relevant data is ensured, further inclusion of more physics (e.g., thermal instability 1 , pore pressure 39 , fluid injection 2 ) into the present framework would lead to a positive improvement, which will be straightforward in view of clear interpretability and extensibility of the present framework. There is ample room for further sophistication. It would be beneficial to consider more flexible, versatile bases 40 for LFs, an extensive library of possible mathematical expressions 41 , powerful symbolic regression methods 42 , or stochastic optimizer 43 . Consistent evolution or automated optimization of many hyper-parameters of the framework may be done by inheriting the reinforcement learning paradigm 37 . This study may spark new research directions for seismogenesis. The smooth surface-like convolved information may catalyze the image-based deep learning methods. Gauss curvature-based signatures of large earthquakes may be clustered by unsupervised ML methods for better prediction models tailored for each category. Unique classifications of individual large earthquakes may advance the existing earthquake forecasting methods 5-8,10,12-15, 31 . In light of the multifaceted nature of earthquake phenomena, enabling imminent Table 2. Individual large earthquake reproductions using the best-so-far customized data-driven models. www.nature.com/scientificreports/ individual earthquake predictions will require comprehensible collaborations 44 , and the outcome of this study will promote such a broad endeavor.
In essence, the primary novelty of this paper lies in the new feature generation processes, which first transform raw earthquake catalog data into spatio-temporal convolved information, then transform the convolved information into a number of pseudo physics-based features, and further transform the pseudo physics into smooth surface-like features using Gaussian curvatures. All these new features are then utilized by a transparent ML method (here, Bayesian evolutionary algorithm) to help identify unique signatures and to unravel hidden rules for prediction of large individual earthquake's magnitude and location. Thereby, this paper's outcome adds a new data-and ML-oriented dimension to the existing research paradigm for seismogenesis as well as natural hazards science and engineering.  Table 3. Algorithm-Searching the closest large earthquake using Gauss curvature coordinates.

End Loop i and j [Step-II] Compare Distances from the Reference Earthquakes
Get ready K (k) where k ∈ the set of epoch numbers of known large earthquakes (i.e. 8 events in Table 1

Methods
Earthquake data homogenization within one-month epoch. Within one epoch (1 month), the spatial coordinates of every earthquake hypocenter available on the catalog are checked to determine which reference volume (i.e., a discretized volume in the lithosphere) the earthquake event belongs to. In particular, this study assumes that the hypocenter's spatial coordinates ( , φ, h) determine the associated reference volume. If an earthquake's hypocenter resides in the jth reference volume's domain ( j ± 0.1 • , φ j ± 0.1 • , h j ± 5 km ), it belongs to the reference volume. To some extent, this process homogenizes all the earthquake events within the reference volume in one epoch (1 month). All the impacts (e.g., the pseudo energy, power, vorticity, Laplacian, etc.) of the earthquakes within one reference volume in one epoch are homogenized and represented by the quantities at the center of the reference volume.
Generation of new features using spatio-temporal information convolution. The raw data of earthquake hypocenters are assumed to be distributed over the lithosphere domain and thus constitute a sort of irregular 3D point cloud (Fig. 1A,B). This point cloud of hypocenters is used for all the following new features using the spatial and temporal convolutions. The observed earthquake hypocenter data sets adopted herein 27   i (see 29 ). A point-wise information index (II) is denoted as "local" II, II local ∈ R[0, 1] and calculated as II local i . The local II maps real earthquake magnitudes to the range of [0,1). Figure S1 Fig. 1A,B. Fault zones are inherently multiscale 2 with a core being surrounded by the damaged zone of which macro-fractures decay with distance from the core 45 . Thus, an individual earthquake's impact may not be described by a point-wise index, requiring a comprehensive means to capture a spatial impact on the surrounding. Complex spatial influences of many earthquakes are accounted for by the spatial convolution presented herein. If convolution is done over a spatial domain, ML can better understand the interaction of spatially distributed information and hidden patterns while applied to the temporal domain, the interactions between past and present information may be elucidated. This study seeks to spatially integrate the local II over the 3D point cloud, i.e., myriad earthquake events in the lithosphere. The key difference from the deep learning is that this study "externalizes" the multi-layered convolutions by conducting multiple convolutions at the information level, not in the opaque network layers. Rather than a uniform integration, we adopt a weighted integration using the Gaussian weight function (denoted ω ) to realize the proximity-proportionate importance of information. This process generates the "convolved spatial II;; denoted as II (t) S . Figure S2C illustrates the derivation of the convolved spatial II. The physical meaning of the convolved spatial II is that II (t) S (ξ j ; L k ) quantifies how much the jth reference volume experiences earthquakes during one epoch (t) while the closer events the higher impact on the volume. The "reference volume" is defined as a discretized volume in the lithosphere with fixed spatial coordinate which is needed for spatial and temporal convolution (see details in "Methods" section and 29 ). This study's reference volume has dimensions of (0.1 deg, 0.1 deg, 5 km) due to the limit of computational resources. For a higher resolution, a finer reference volume may be adopted. If the earthquakes during the epoch took place nearby (i.e. within or close to the L k ) the failure directly affects the jth reference volume whereas earthquakes occurred at distance (i.e. much larger than L k ), the reduced impact is recorded in the jth reference volume via the II (t) S (ξ j ; L k ) . This is a time-dependent quantity and thus defined at an epoch (t) and calculated as ; ξ j stands for the position vector of the center of jth reference volume and ∀x (t) i ∈ V , and V means the entire lithosphere domain under consideration. In this research, the 3D lithosphere domain is discretized into reference volumes-a fixed Eulerian 3D grid system. Thus, Eq. (9) is calculated by the discrete summation in lieu of continuous integration, of which details are presented in 29 . In Eq. (9), L k ∈ R + , k = 1, . . . , n L stands for the radius of influence range. With a larger value of L k , the earthquake events across a broad space can be incorporated at the expense of over-smoothing effect; with a smaller L k , higher priority on the adjacent earthquakes to the current reference volume at the expense of local spikes or over-fitting effect. For the weighting function, there is no restriction to the use of other weightings. The dimension parameter N = 3 is used for the spatial convolution over the 3D point cloud whereas N = 1 is used for the temporal convolution over time which shall be explained later. Fig. S2 shows three cases of the convolved spatial II with different influence ranges. All three cases used the procedures given in Eq. (9). Still, with a larger L > 50 km such as Fig.S2C, theover-smoothing effect is notable. In heterogeneous materials or composite structures, this spatial convolved II may help ML understand internal complexity as scientistsdo 32-34,46,] . www.nature.com/scientificreports/ A reference volume in the lithosphere experiences incessantly many events over time. By extending convolution to the time domain, we can incorporate such transient information about how one reference volume has been being affected by past earthquakes. Performing convolution over time creates "convolved spatio-temporal II" (denoted as II (t) ST ). This convolved spatio-temporal II accounts for all the past earthquakes up to the present epoch t. Figure 1D illustrates the calculation procedure of the spatio-temporal II. The one-dimensional ( N = 1 ) Gaussian weighting is used, being centered at the present time t. Being not certain about the optimal temporal influence ranges, here we allow in total n T temporal influence ranges, denoted by T l , l = 1, . . . , n T . For a temporal influence range T l , we have ; τ = |t − t past |, t ≥ t past , meaning the time gap between the current and the past time, all given in [epoch]. This convolved spatio-temporal II is calculated at the jth reference volume center, ξ j . As the 3D lithosphere domain is discretized into a fixed Eulerian grid system of reference volumes, the continuous time space is discretized into non-overlapping epochs (one month each). Like the spatial convolved information Eq. (9), the spatio-temporal convolved information Eq. (10) is calculated by the discrete summation, of which details are presented in 29 .
As expected, the convolved spatio-temporal II appears to successfully quantify earthquake events (Fig. S5) and effectively distinguish the low and high seismic activities. The required numerical normalization and proofs of upper and lower bounds of the convolved IIs are provided in 29 .

Pseudo released energy in terms of the convolved information.
Earthquakes leave behind a footprint on energy 2,45 . The pseudo released energy (denoted as E (t) r (ξ j ) ∈ R + ) of the jth reference volume at current time t may be represented in terms of the convolved spatio-temporal IIs. It should be noted that this paper does not adopt the well-known laws since the present goal is to pursue purely data-driven learning. Owing to the accumulated influences of adjacent earthquakes over time, it is plausible to consider that the pseudo released energy at a reference volume is increasing. Thus, the simple exponential form is preferred for the unknown expression of the pseudo released energy ( E r ) in terms of the convolved spatio-temporal IIs ( II ST ). To identify the hidden expression of E r (II ST ) , this paper leverages the Bayesian evolutionary algorithm in which an advanced evolutionary algorithm searches the vast space of parameters of the expression while the Bayesian update enables the probabilistic distributions of parameters to evolve as training proceeds with new data (see section "Combination of the Bayesian update and evolution algorithm" in 29 ). Amongst many possible combination operations (e.g., + or × ), the additive operation is found to be favorable. The best-so-far expression of the pseudo released energy identified by the Bayesian evolutionary algorithm is given by where θ (k,l) is the best-so-far free parameters of the associated link function L (k,l) . Role of the link function L (k,l) is to transform the convolved spatio-temporal information index II (t) ST (ξ j ; L k , T l ) into a smooth, nonlinear output-here the pseudo released energy. In spirit, Eq. (11) resembles multiple layers of deep neural network as a nonlinear transformation route. Example plots using the exponential link functions (LFs) with the additive combination are shown in Fig. S7. The flexibility and expressibility of the LFs are explained in "Methods" and 29 . And the best combination of the spatial and temporal influence ranges is identified as L k = (10, 25) [km] and T l = (3, 6) [epoch = month]. This combination of short-and long-range influence ranges appears to outperform the other rules with a single L or T or many L's and T's. As shown in Fig. S8, the relative contribution of different influence ranges appears complicated but interpretable. In the higher II ranges ( II ST > 0.8), the spatio-temporal II with (L 1 , T 2 ) = (10 km, 6 epochs) and (L 2 , T 1 ) = (25 km, 3 epochs) to the pseudo released energy are significant (see Figs. S8B-C). In contrast, the contribution of II with (L 1 , T 1 ) = (10 km, 3 epochs) are uniform regardless of II ST and thus important in the low and mid ranges of II ( II ST < 0.8; Fig. S8A). Although this identified rule of the pseudo released energy may not be close to the "exact" one, the clear interpretability of the pseudo released energy's expression is still meaningful, conveying physically sound implications. For instance, Fig. S8A implies that nearly all the earthquakes in close distance and recent time retain their influence. Contrarily, Figs. S8B-C imply that only larger earthquakes ( II ST >0.8) retain influence because they are far away or old enough to allow post-earthquake curing.
Pseudo power, vorticity and Laplacian of the pseudo released energy. Other important physics quantities would be the spatial gradients of the pseudo released energy over the lithosphere and "power. " The time derivative of energy is physically related to the power. The calculation procedure of the time derivatives of the energy-related terms is presented in 29 . Figs. S9 and S10 present example plots of the spatial gradients and time derivative of the pseudo released energy at depth 2.5 km and 12.5 km, respectively. These plots are with respect to the earth-centered coordinate system before transformation to geodetic coordinates. The spatial gradient with respect to the geocentric coordinate system may convey weak physical and geometrical information in view of the curved structure of the earth lithosphere. Thus, it is meaningful to transform the geocentric gradient (denoted as ∇E (t) r ) to the geodetic gradient (denoted as ∇ g E (t) r ), i.e., the spatial gradient with respect to the geo- r (ξ j ) . Fig. S11 shows example plots of the gradient field vector at depth z = 12.5 km with respect to the geodetic coordinate system. By observing the transient change of the spatial gradient of the pseudo released energy, this study derives the pseudo "vorticity" ω = (ω , ω φ , ω h ) as In Eq. (12), ∇ g = (∂/∂ ; ∂/∂φ, ∂/∂h);"× " is the curl operator; . Fig. S12 presents example plots of the calculated vorticity vector. The vorticity of the pseudo released energy flow is considered as another physics quantity since the vorticity may hint at the temporal rotation of the strain energy field which may play an important role in rupture initiation. There is no direct definition of the velocity field needed for vorticity calculation, and thus the spatial gradient of the time derivative of the pseudo released energy ( ∇ g ) is regarded as a "pseudo velocity" in Eq. (12). Physically, this pseudo velocity field may describe the spatial distribution of how the pseudo released energy is changing over time. Although the time increment is large (here, one month) compared to the mathematical derivative, the slow motion of the earth plate (e.g., 8-10 cm/year 21 ) may justify the use of such a large time interval for the pseudo velocity.
This study found that the higher-order gradient of the pseudo released energy also helps improve the prediction accuracy. In fact, such an important role of higher-order gradients can be easily found in many physics phenomena. For instance, in the heat transfer, the Laplacian of the externally observed temperature (T) is directly related to the temporal evolution of the internal heat energy, i.e., ∂T/∂t = κ∇ 2 T where κ is the thermal diffusivity. This study calculates the "pseudo Laplacian" as In addition to the resultant Laplacian in Eq. (14), this study compared individual term's impact on prediction performance, i.e., prediction with The comparative study found that the inclusion of the separate term of Calculation of the Gauss curvatures of the pseudo physics quantities. By regarding the distribution of a physics quantity at a certain depth as a smooth surface, this study can calculate the Gauss curvature at the maximum earthquake events. At a fixed depth h, let u stand for and v for φ . Let Z(u, v, h) stand for a physics quantity calculated at the depth h and the horizontal coordinates (u, v). Z can be any pseudo physics quantities such as the pseudo released energy, power, vorticity, and the Laplacian terms. Then, we can conceive a smooth surface X(u, v, Z) at depth h. According to geometry, we can call X as a map X : → E 3 where is some open subset of plane (u, v) ∈ R 2 and E 3 is the Euclidean space where the standard inner product holds. And we assume X to be C 3 -continuous, i.e., all the partial derivatives up to the 3rd order exist and are continuous. This may be assured by the smoothness of all the pseudo physics quantities endowed by the spatio-temporal convolution process, and a rigorous mathematical proof is deferred to future extension. It is well known [e.g., 47 ] that the (total) Gauss curvature K is defined with the principal curvatures κ 1 and κ 2 as K := κ 1 κ 2 of which detailed calculation procedures are presented in 29 . Using these Gauss curvature-based coordinates, the unique signature of individual large earthquake can be found by an exhaustive searching algorithm in Table 3.

Reference volumes in the Earth lithosphere domain.
This study defines reference volumes of the given domain in the Earth lithosphere. To generate 4D convolved spatio-temporal information index (II) by performing spatial and temporal convolutions, it is efficient to define a fixed location in the space and time, which is the central reason for adopting the Eulerian 3D grid system of reference volumes. Admittedly the Earth lithosphere is not a simple spherical structure, and the earthquake hypocenters are often recorded on longitude, latitude, and depth, ( , φ, h) (t) i , i = 1, . . . , n (t) . Therefore, this study processes raw data to the earth-centered 3D coordinates, (x, y, z)   (12)  Reference Volumes: Given the ranges of longitudes, latitudes, and depths, this study defines uniformly distributed grid system, and each cell is denoted as a reference volume. The total number of reference volumes n rv is simply calculated by n rv = n × n φ × n h where n = | max − min |/� , n φ = |φ max − φ min |/�φ , and n h = |h max − h min |/�h . Here (� , �φ, �h) are user-defined increments of longitude, latitude, and depth, respectively. The index of reference volume is ordered by , φ, and h. Thus, the coordinates or the jth reference volume's center, denoted as ξ j ∈ R 3 , is represented by  (15)(16)(17). Whenever using Eq. (17), the outward normal is used for the positive sign of the depth. Another important quantity about the reference volume is the actual volume of individual reference volume element. In view of the curved ellipsoidal lithosphere, the volume of the jth reference volume element V j [ km 3 ] is calculated by Computational implementation of proposed algorithms. The spatio-temporal convolution of earthquake catalog data to generate the convolved II is computationally expensive. This study developed a parallelized Bayesian evolutionary algorithm framework with C++ and OpenMPI. All other learning, evolutionary algorithm and Bayesian update scheme are implemented on the parallel program. The developed program is made available upon request to the author. Iowa State University's high-performance computing facility, Condo cluster is used for this study.
Flexible and transparent link functions. Placing top priority on the interpretability, this study proposes to adopt an expressive link function (LF) using transparent, flexible basis that can describe a mathematical expression between the convolved spatio-temporal II, II ST and the hidden physical rules. LF is denoted as L(II ST ; θ) where θ is a set of free parameters prescribing the LF. This study used an evolutionary algorithm coupled with the Bayesian update scheme to enable LF to continue to learn, train, and evolve. There is little restriction of choice of other forms of LFs. For balancing the efficiency and interpretability, one may choose the cubic regression spline (CRS)-based LF with high flexibility 40,49 or two-parameter based exponential LF with its simplicity. First, the CRS-based LF has a general form as L (k,l) (II where θ (k,l) = {a, x * } (k,l) with a (k,l) = {a 1 , . . . , a p } (k,l) , the knots x * (k,l) = {x * 1 , . . . , x * (p−2) } (k,l) , and the cubic spline basis b (k,l) i given in Eq. (33) in 29 . Next, the two-parameter exponential LF has a simpler form as L (k,l) (II , b (k,l) ; k = 1, . . . , n L , l = 1, . . . , n T } , and "-1" is to make the minimum of the LF near zero. It should be noted that the exponential LF is always nonzero, positive, and monotonically increasing while preserving the concave or convex shape (see Fig. S6 in 29 ).

Data availability
The processed 40-years data sets consisting of the month-based epochs and the refined day-based epochs are shared on a cloud storage 28 . Other supplementary data and parallel programs supporting other findings of this paper will be available upon request to the author.  (15) x = a 2 a 2 cos 2 φ + b 2 sin 2 φ + h cosφ cos (16) y = a 2 a 2 cos 2 φ + b 2 sin 2 + h cosφ sin